FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs

Authors

Da Zheng

Disa Mhembere

Randal C. Burns

Joshua T. Vogelstein

Carey E. Priebe

Alexander S. Szalay

Abstract

Graph analysis performs many random reads and writes, so these workloads are typically performed in memory. Traditionally, analyzing large graphs requires a cluster of machines so that the aggregate memory exceeds the graph size. We demonstrate that a multicore server can process graphs with billions of vertices and hundreds of billions of edges, utilizing commodity SSDs with minimal performance loss. We do so by implementing a graph-processing engine on top of a user-space SSD file system designed for high IOPS and extreme parallelism. Our semi-external memory graph engine, called FlashGraph, stores vertex state in memory and edge lists on SSDs. It hides latency by overlapping computation with I/O. To save I/O bandwidth, FlashGraph accesses only the edge lists that applications request from SSDs; to increase I/O throughput and reduce CPU overhead for I/O, it conservatively merges I/O requests. These designs maximize performance for applications with different I/O characteristics. FlashGraph exposes a general and flexible vertex-centric programming interface that can express a wide variety of graph algorithms and their optimizations. We demonstrate that FlashGraph in semi-external memory performs many algorithms with performance up to 80% of its in-memory implementation and significantly outperforms PowerGraph, a popular distributed in-memory graph engine.
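The idea of conservatively merging I/O requests can be illustrated with a minimal sketch. It assumes that "conservative" means two edge-list reads are combined only when they touch the same page or adjacent pages on the SSD. The function name merge_requests, the (offset, length) request format, and the 4096-byte page size are all hypothetical choices made for illustration. None of them is FlashGraph's actual API.

```python
from typing import List, Tuple

PAGE_SIZE = 4096  # assumed page size in bytes (illustrative, not from the paper)


def merge_requests(
    requests: List[Tuple[int, int]], page_size: int = PAGE_SIZE
) -> List[Tuple[int, int]]:
    """Merge byte-range reads into page spans (hypothetical helper).

    Each request is (offset, length) in bytes. The result is a sorted list of
    (first_page, last_page) spans. Two requests are merged only when their
    page ranges overlap or are directly adjacent, so no unrequested page is
    ever read.
    """
    # Convert each non-empty byte range to the inclusive range of pages it touches.
    spans = sorted(
        (off // page_size, (off + length - 1) // page_size)
        for off, length in requests
        if length > 0
    )
    merged: List[Tuple[int, int]] = []
    for first, last in spans:
        if merged and first <= merged[-1][1] + 1:
            # Same or adjacent page: extend the previous span.
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            # A gap of at least one page: start a new I/O request.
            merged.append((first, last))
    return merged


if __name__ == "__main__":
    # Three small reads on pages 0 and 1 collapse into one span; page 10 stays separate.
    print(merge_requests([(0, 100), (200, 50), (4096, 10), (40960, 8)]))
```

Under this assumption, reads separated by even one unrequested page are issued separately, which trades some extra requests for not spending bandwidth on data the application never asked for.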

Related articles

BigSparse: High-performance external graph analytics

We present BigSparse, a fully external graph analytics system that picks up where semi-external systems like FlashGraph and X-Stream, which only store vertex data in memory, left off. BigSparse stores both edge and vertex data in an array of SSDs and avoids random updates to the vertex data by first logging the vertex updates and then sorting the log to sequentialize accesses to the SSDs. This...

An SSD-based eigensolver for spectral analysis on billion-node graphs

Many eigensolvers such as ARPACK and Anasazi have been developed to compute eigenvalues of a large sparse matrix. These eigensolvers are limited by the capacity of RAM. They run in the memory of a single machine for smaller eigenvalue problems and require distributed memory for larger problems. In contrast, we develop an SSD-based eigensolver framework called FlashEigen, which extends Anasazi e...

FlashMatrix: Parallel, Scalable Data Analysis with Generalized Matrix Operations using Commodity SSDs

FlashMatrix is a matrix-oriented programming framework for general data analysis with a high-level functional programming interface. It scales matrix operations beyond memory capacity by utilizing solid-state drives (SSDs) in a non-uniform memory architecture (NUMA). It provides a small number of generalized matrix operations (GenOps) and reimplements a large number of matrix operations in the R fr...

$n$-Array Jacobson graphs

We generalize the notion of Jacobson graphs into $n$-array columns called $n$-array Jacobson graphs and determine their connectivities and diameters. We also study forbidden structures of these graphs and determine when an $n$-array Jacobson graph is planar, outerplanar, projective, perfect or domination perfect.

Estimating graph distance and centrality on shared nothing architectures

We present a parallel toolkit for pairwise distance computation in massive networks. Computing the exact shortest paths between a large number of vertices is a costly operation, and serial algorithms are not practical for billion-scale graphs. We first describe an efficient parallel method to solve the single source shortest path problem on commodity hardware with no shared memory. Using it as ...

Publication date: 2015
